arXiv:1502.04977v1 [math.ST] 17 Feb 2015


PRECISE ERROR ANALYSIS OF THE $\ell_2$-LASSO


Christos Thrampoulidis*, Ashkan Panahi†, Daniel Guo*, Babak Hassibi*

* Department of Electrical Engineering, Caltech, Pasadena, USA
† Signal Processing Group, Chalmers University of Technology, Gothenburg, Sweden


ABSTRACT 

A classical problem that arises in numerous signal processing
applications asks for the reconstruction of an unknown, $k$-sparse
signal $\mathbf{x}_0 \in \mathbb{R}^n$ from underdetermined, noisy, linear
measurements $\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{z} \in \mathbb{R}^m$. One standard approach
is to solve the convex program
$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 + \lambda\|\mathbf{x}\|_1$,
which is known as the $\ell_2$-LASSO. We assume
that the entries of the sensing matrix $\mathbf{A}$ and of the noise vector
$\mathbf{z}$ are i.i.d. Gaussian with variances $1/m$ and $\sigma^2$, respectively. In the large
system limit, when the problem dimensions grow to infinity
at constant rates, we precisely characterize the limiting
behavior of the normalized squared error $\|\hat{\mathbf{x}} - \mathbf{x}_0\|_2^2/\sigma^2$. Our
numerical illustrations validate our theoretical predictions.

Index Terms — LASSO, square-root LASSO, normalized 
squared error, sparse recovery, Gaussian min-max theorem 

1. INTRODUCTION 

1.1. Motivation 

The Least Absolute Shrinkage and Selection Operator (LASSO)
is a celebrated convex program used to estimate sparse signals
from noisy, linear, underdetermined observations. Given a
vector of observations $\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{z} \in \mathbb{R}^m$ of an unknown
but $k$-sparse (i.e., at most $k$ nonzero entries) signal $\mathbf{x}_0 \in \mathbb{R}^n$,
the $\ell_2$-LASSO (see footnote 1) produces the following estimate for $\mathbf{x}_0$:

$$\hat{\mathbf{x}} := \arg\min_{\mathbf{x}} \Phi(\mathbf{x}; \mathbf{A}, \mathbf{z}) := \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 + \frac{\lambda}{\sqrt{m}}\|\mathbf{x}\|_1. \tag{1}$$

Here, $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the sensing matrix, $\mathbf{z} \in \mathbb{R}^m$ is the noise
vector and $\lambda > 0$ is a regularizer parameter. The LASSO has
long been investigated from different perspectives and shown
to enjoy unique properties in terms of both computation and
precision. Yet some of its important asymptotic properties
are not yet fully understood. Our interest is in the
exact characterization of the reconstruction error $\|\hat{\mathbf{x}} - \mathbf{x}_0\|_2$.

1.2. Contribution 

We assume a generic setup in which the entries of the sensing
matrix and the non-zero entries of $\mathbf{x}_0$ are i.i.d. Gaussian.
Also, the noise vector $\mathbf{z}$ is assumed to have i.i.d. Gaussian
entries with variance $\sigma^2$. Under these assumptions we derive an
asymptotically exact expression for the normalized squared
error (NSE) $\|\hat{\mathbf{x}} - \mathbf{x}_0\|_2^2/\sigma^2$ of the $\ell_2$-LASSO. In the low
noise regime $\sigma^2 \to 0$, our result reduces to simple and
interpretable formulae. Although our theoretical analysis requires
an asymptotic setting in which the problem dimensions grow
to infinity, our numerical illustrations suggest that the
predictions are already accurate for problem dimensions of
only a few hundred. We also remark on our assumption
that the sensing matrix is Gaussian. This assumption
has a long tradition in the statistics literature and yields
important insights [3]; the Gaussian ensemble has
properties that make the analysis tractable, while at the
same time many of the results generalize to a wider class of
distributions. The main technical tool used in our analysis is
a Gaussian comparison inequality due to Gordon [4]. When
combined with appropriate convexity assumptions, the
inequality can be shown to be tight [2, 5, 6, 7], which makes it
ideal for our precise analysis.

1 Also known as “square-root LASSO” [1]. Please refer to [2, Sec. 1.3].

1.3. Relevant Literature 

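The LASSO was introduced by Tibshirani in [8] in the form

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 \quad \text{s.t.} \quad \|\mathbf{x}\|_1 \le \|\mathbf{x}_0\|_1. \tag{2}$$

(1) is a regularized version of (2). An alternative to (1) solves

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \tau\|\mathbf{x}\|_1. \tag{3}$$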

Both this and (1) are variations of the same algorithm, and
Lagrange duality ensures that they both become equivalent to
the constrained optimization (2) for a proper choice of the
regularizer parameters. There are reasons to argue in favor of
either of them [1, 9], but they go beyond the scope of this
paper. Early well-known bounds on the reconstruction error of
the LASSO were order-wise in nature (i.e., accurate only up
to constant multiplicative factors) and were derived based on RIP
and Restricted Eigenvalue assumptions on the measurement
matrix [1, 10, 11, 12, 13]. To the best of our knowledge, the
first precise asymptotic evaluation of the limiting behavior of
the LASSO reconstruction error is due to Bayati and
Montanari [3]; they consider problem (3) with an i.i.d. Gaussian sensing
matrix $\mathbf{A}$ and use the Approximate Message Passing (AMP)
framework. More recently, and closer to our work, Stojnic
introduced an alternative framework that cleverly uses a
celebrated Gaussian comparison inequality due to Gordon [4]; he
applied this to (2) and obtained precise characterizations of
its worst-case NSE, which he showed to occur in the limit
$\sigma^2 \to 0$. The same framework was used in [2] to generalize
the results of [14] to arbitrary convex regularizer functions.
Moreover, the authors of [2] consider (1) and obtain simple,
precise error formulae when $\sigma^2 \to 0$. Our work extends this
result of [2] in several directions for the case of sparse
recovery. First, it holds for arbitrary values of the noise variance
$\sigma^2$; to our knowledge, this is the first such precise asymptotic
result for (1) ([3] considers (3)), and it illustrates the
capabilities of the framework used. Also, when $\sigma^2 \to 0$, it recovers
the result of [2] and extends the range of values of $\lambda$
for which it holds. Finally, we believe that our result can be
used to prove that the worst-case NSE of (1) occurs when
$\sigma^2 \to 0$. This would imply that the simple and interpretable
expressions corresponding to that regime are tight bounds on
the NSE for arbitrary noise levels. Some additional effort is
required to prove this claim, which is thus left for future work.
2. ANALYSIS 

2.1. Problem Setup 

For the rest of the paper, let $\mathcal{N}(\mu, \sigma^2)$ denote the normal
distribution of mean $\mu$ and variance $\sigma^2$. Also, we use $\|\cdot\|$
instead of $\|\cdot\|_2$. Let $\mathbf{x}_0 \in \mathbb{R}^n$ denote the unknown signal and
$\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{z} \in \mathbb{R}^m$ denote the vector of observations. We
make the following assumptions:

• the entries of $\mathbf{A}$ are i.i.d. $\mathcal{N}(0, 1/m)$,

• the entries of $\mathbf{z}$ are i.i.d. $\mathcal{N}(0, \sigma^2)$,

• $\mathbf{x}_0$ is $k$-sparse with fixed support set $S$, i.e., $(\mathbf{x}_0)_i = 0$
for all $i \notin S$. Also, $(\mathbf{x}_0)_i \overset{\text{iid}}{\sim} \mathcal{N}(0,1)$ for $i \in S$.

Consider estimating $\mathbf{x}_0$ via the solution $\hat{\mathbf{x}} := \hat{\mathbf{x}}(\lambda)$ of the
LASSO program in (1). Our goal is to provide tight
expressions for the reconstruction error $\|\hat{\mathbf{x}} - \mathbf{x}_0\|$; this depends
explicitly on $\mathbf{x}_0$ and implicitly on $\mathbf{A}$ and $\mathbf{z}$. We consider an
asymptotic setting in which the problem parameters $n$, $m$ and
$k$ grow proportionally as $m/n \to \delta \in (0,1)$ (see footnote 2) and
$k/m \to \gamma \in (0,1)$. Our result characterizes the limiting behavior
of the quantity of interest as $n \to \infty$. To simplify
notation, we use the symbol “$\approx$” to denote convergence in
probability, i.e., $X_n \approx X$ is used to denote that a sequence of
random variables $X_n$ converges to $X$ in the sense that for all $\epsilon > 0$,
$\lim_{n\to\infty} P(|X_n - X| > \epsilon X) = 0$. Similarly, we write $X_n \gtrsim X$
if for all $\epsilon > 0$, $\lim_{n\to\infty} P(X_n < (1 - \epsilon)X) = 0$.
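The measurement model described by the assumptions above can be sketched in a few lines of plain Python. This is only an illustration under our own assumptions: the function name `sample_instance`, the choice of the first `k` coordinates as the fixed support $S$, and the seed are hypothetical and not part of the analysis.

```python
import math
import random

def sample_instance(n, m, k, sigma, seed=0):
    """Draw (A, x0, z, y) with y = A x0 + z under the Gaussian model.

    Assumption for illustration: the fixed support S is {0, ..., k-1}.
    """
    rng = random.Random(seed)
    # Sensing matrix: i.i.d. N(0, 1/m) entries, i.e. standard deviation 1/sqrt(m).
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
    # k-sparse signal: N(0, 1) on the support, exactly zero elsewhere.
    x0 = [rng.gauss(0.0, 1.0) if i < k else 0.0 for i in range(n)]
    # Noise: i.i.d. N(0, sigma^2) entries.
    z = [rng.gauss(0.0, sigma) for _ in range(m)]
    # Noisy linear measurements y = A x0 + z.
    y = [sum(a * x for a, x in zip(row, x0)) + zi for row, zi in zip(A, z)]
    return A, x0, z, y

A, x0, z, y = sample_instance(n=50, m=15, k=2, sigma=0.1)
```

The small sizes in the usage line are chosen only to keep the pure-Python loops fast; the asymptotic statements concern $n \to \infty$ with $m/n$ and $k/m$ held fixed.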

2.2. Preliminaries 

It is convenient to rewrite (1) by changing the optimization
variable to the error vector $\mathbf{w} := \mathbf{x} - \mathbf{x}_0$ (see footnote 3):

$$\min_{\mathbf{w}} \Phi(\mathbf{w}; \mathbf{A}, \mathbf{z}) := \sqrt{m}\,\|\mathbf{z} - \mathbf{A}\mathbf{w}\| + \lambda\|\mathbf{x}_0 + \mathbf{w}\|_1. \tag{4}$$

2 Our analysis extends to the overdetermined case, where $\delta \in [1, \infty)$. For
simplicity, in this paper we focus on the underdetermined regime.

3 Also, note the re-scaling in (4) by a factor of $\sqrt{m}$.
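For completeness, the change of variables behind (4) can be written out in two lines. This is a sketch in the notation above; it only uses $\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{z}$ and the fact that multiplying an objective by $\sqrt{m} > 0$ does not change its minimizers.

```latex
\begin{align*}
\sqrt{m}\Big( \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 + \tfrac{\lambda}{\sqrt{m}} \|\mathbf{x}\|_1 \Big)\Big|_{\mathbf{x} = \mathbf{x}_0 + \mathbf{w}}
 &= \sqrt{m}\,\|\mathbf{A}\mathbf{x}_0 + \mathbf{z} - \mathbf{A}(\mathbf{x}_0 + \mathbf{w})\|_2 + \lambda \|\mathbf{x}_0 + \mathbf{w}\|_1 \\
 &= \sqrt{m}\,\|\mathbf{z} - \mathbf{A}\mathbf{w}\|_2 + \lambda \|\mathbf{x}_0 + \mathbf{w}\|_1 .
\end{align*}
```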


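Denote by $\hat{\mathbf{w}}^{\lambda,\sigma}(\mathbf{A}, \mathbf{z})$ the solution of (4). We often drop the
dependence on the arguments $\lambda, \sigma, \mathbf{A}, \mathbf{z}$ when clear from
context. Suppose we want to show that $\|\hat{\mathbf{w}}\| \approx \alpha_*$, for
appropriate $\alpha_* > 0$. Equivalently, for all $\epsilon > 0$ and sets
$\mathcal{R}_\epsilon := \{\ell \mid |\ell - \alpha_*| > \epsilon\alpha_*\}$, we wish that
$\lim_{n\to\infty} P(\|\hat{\mathbf{w}}\| \in \mathcal{R}_\epsilon) = 0$.
It suffices to prove the existence of $\delta := \delta(\epsilon) > 0$ such that

$$\lim_{n\to\infty} P\Big( \min_{\|\mathbf{w}\| \in \mathcal{R}_\epsilon} \Phi(\mathbf{w}) < (1 + \delta)\,\Phi(\hat{\mathbf{w}}) \Big) = 0. \tag{5}$$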


Directly showing (5) through analyzing $\Phi(\mathbf{w})$ is difficult.
Instead, we use the Gaussian min-max Theorem to translate
the LASSO objective function in (4) to a simpler one that
is amenable to direct analysis.


2.3. Introducing a Simpler Optimization 


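The Gaussian min-max Theorem belongs to the family of the
so-called Gaussian comparison inequalities [15, Ch. 3] and
was proved by Gordon in [4, Lemma 3.1]. To see how that
theorem can be applied in our case, observe that

$$\Phi(\mathbf{w}) = \max_{\|\mathbf{u}\| \le 1} \mathbf{u}^T \big[\sqrt{m}\mathbf{A},\; -\mathbf{z}/\sigma\big] \begin{bmatrix} \mathbf{w} \\ \sigma\sqrt{m} \end{bmatrix} + \lambda\|\mathbf{x}_0 + \mathbf{w}\|_1.$$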
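As a brief check of the identity above (a sketch in our notation, not taken from the original text): the block product collapses to a rescaled residual, and maximizing a linear form over the unit ball returns its Euclidean norm. The sign of the residual is immaterial because the unit ball is symmetric.

```latex
\begin{align*}
\big[\sqrt{m}\mathbf{A},\; -\mathbf{z}/\sigma\big]
\begin{bmatrix} \mathbf{w} \\ \sigma\sqrt{m} \end{bmatrix}
 &= \sqrt{m}\,\mathbf{A}\mathbf{w} - \sqrt{m}\,\mathbf{z}
  = \sqrt{m}\,(\mathbf{A}\mathbf{w} - \mathbf{z}), \\
\max_{\|\mathbf{u}\| \le 1} \mathbf{u}^T \sqrt{m}\,(\mathbf{A}\mathbf{w} - \mathbf{z})
 &= \sqrt{m}\,\|\mathbf{z} - \mathbf{A}\mathbf{w}\| .
\end{align*}
```

Under the assumptions of Section 2.1, the entries of $\sqrt{m}\mathbf{A}$ and of $\mathbf{z}/\sigma$ are i.i.d. $\mathcal{N}(0,1)$, so the matrix in brackets is a standard Gaussian matrix, which is the form a Gaussian comparison argument requires.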


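Let $\mathbf{g} \in \mathbb{R}^m$, $\mathbf{h} \in \mathbb{R}^n$, $g \in \mathbb{R}$ have i.i.d. $\mathcal{N}(0,1)$ entries and

$$\phi(\mathbf{w}; \mathbf{g}, \mathbf{h}) := \max_{\|\mathbf{u}\| \le 1} \Big\{ \sqrt{\|\mathbf{w}\|^2 + m\sigma^2}\; \mathbf{g}^T\mathbf{u} - \|\mathbf{u}\|\big(\mathbf{h}^T\mathbf{w} + \sqrt{m}\,\sigma g\big) + \lambda\|\mathbf{w} + \mathbf{x}_0\|_1 \Big\}. \tag{6}$$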

Then, the Gaussian min-max Theorem (see footnote 4) states that for an
arbitrary set $\mathcal{R} \subset \mathbb{R}^n$ and $c \in \mathbb{R}$:

$$P\Big(\min_{\mathbf{w} \in \mathcal{R}} \Phi(\mathbf{w}; \mathbf{A}, \mathbf{z}) < c\Big) \le 2\,P\Big(\min_{\mathbf{w} \in \mathcal{R}} \phi(\mathbf{w}; \mathbf{g}, \mathbf{h}) \le c\Big). \tag{7}$$

Recently, it was shown in [7, 14] that when combined with
appropriate convexity assumptions, the Gaussian min-max
Theorem is tight. In particular, for a convex compact set $\mathcal{R}$ it is
also true [7, Theorem II.1] that

$$P\Big(\min_{\mathbf{w} \in \mathcal{R}} \Phi(\mathbf{w}; \mathbf{A}, \mathbf{z}) > c\Big) \le 2\,P\Big(\min_{\mathbf{w} \in \mathcal{R}} \phi(\mathbf{w}; \mathbf{g}, \mathbf{h}) \ge c\Big). \tag{8}$$

(7) and (8) are critical for establishing (5). They suggest the
analysis of the following optimization problem (see footnote 5), which we
refer to as “Gordon’s optimization (GO)”:

$$\text{(GO)} \qquad \phi(\mathbf{w}^*; \mathbf{g}, \mathbf{h}) := \min_{\mathbf{w}} \phi(\mathbf{w}; \mathbf{g}, \mathbf{h}), \tag{9}$$

in place of the original LASSO optimization in (4). To see
how this can be useful, assume that $\phi(\mathbf{w}^*) \approx \mathcal{D}_*$; then, it
would follow directly from (7) and (8) that $\Phi(\hat{\mathbf{w}}) \approx \mathcal{D}_*$.
Next, we analyze such asymptotic properties of (GO).

4 In fact, the statement in (7) is a slight variation of the original statement
of the Gaussian min-max Theorem. Please refer to [7] for details.

5 The proof of (8) in [7] requires the set $\mathcal{R}$ to be compact. This technical
detail can be resolved by assuming a sufficiently large upper bound $K$ on
$\|\mathbf{w}\|$ such that constraining the minimization in (9) over the set $\|\mathbf{w}\| \le K$
does not change the optimal cost.







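$$D(\lambda) := k(1 + \lambda^2) + (n - k)\,\rho(1, \lambda), \qquad C(\lambda) := -(\lambda/2)\,\partial D(\lambda)/\partial\lambda, \qquad \varphi(\lambda) := \lambda \big/ Q^{-1}\big(1/2 + (m - D(\lambda) - C(\lambda))/(2k)\big),$$

$$f(\lambda, \varphi) := m - D(\lambda) + m\sigma^2(1 - \varphi^2) + k(\varphi^2 - \lambda^2 - 2)\big(2Q(\lambda/\varphi) - 1\big) + k\sqrt{2/\pi}\,\lambda\varphi\,\exp\big(-\lambda^2/(2\varphi^2)\big). \tag{10}$$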


2.4. Analyzing Gordon’s Optimization 

2.4.1. Scalarization 

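We begin by simplifying (GO) and reducing it to an
optimization problem involving only scalars. For now, assume
that $\mathbf{g}$, $\mathbf{h}$ and $g$ are all fixed. Note that the maximization over
the direction of $\mathbf{u}$ in (6) is easy to evaluate. Furthermore, we
may write $\|\mathbf{x}_0 + \mathbf{w}\|_1 = \max_{\|\mathbf{v}\|_\infty \le 1} (\mathbf{x}_0 + \mathbf{w})^T\mathbf{v}$. With these:

$$\phi(\mathbf{w}^*) = \min_{\mathbf{w}} \max_{\substack{0 \le \beta \le 1 \\ \|\mathbf{v}\|_\infty \le 1}} \Big\{ \sqrt{\|\mathbf{w}\|^2 + m\sigma^2}\,\|\mathbf{g}\|\beta - (\beta\mathbf{h} - \lambda\mathbf{v})^T\mathbf{w} - \beta\sqrt{m}\,\sigma g + \lambda\mathbf{x}_0^T\mathbf{v} \Big\}. \tag{11}$$

The objective function in (11) is convex-concave in $\mathbf{w}$ and
$(\beta, \mathbf{v})$. Also, the constraint sets are closed and convex and one of
them is bounded; thus, we can switch the corresponding order
of min-max [16, Cor. 37.3.2]. After this, the minimization
over the direction of $\mathbf{w}$ is easy to evaluate:

$$\phi(\mathbf{w}^*) = \max_{\substack{0 \le \beta \le 1 \\ \|\mathbf{v}\|_\infty \le 1}} \min_{\alpha \ge 0} \Big\{ \sqrt{\alpha^2 + m\sigma^2}\,\|\mathbf{g}\|_2\,\beta - \beta\sqrt{m}\,g\sigma - \alpha\|\lambda\mathbf{v} - \beta\mathbf{h}\|_2 + \lambda\mathbf{x}_0^T\mathbf{v} \Big\}.$$

Note that the optimization variable $\alpha$ above plays the role of
the $\ell_2$-norm of $\mathbf{w}$; its optimal value $\alpha_*$ is equal to $\|\mathbf{w}^*\|_2$.
Switching one last time the order of optimization between
$(\mathbf{v}, \beta)$ and $\alpha$, we conclude with the following optimization: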


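$$\min_{\alpha \ge 0} \max_{0 \le \beta \le 1} \Big\{ \sqrt{\alpha^2 + m\sigma^2}\,\|\mathbf{g}\|_2\,\beta - \beta\sqrt{m}\,g\sigma - \beta\,\mathcal{F}(\alpha, \beta) \Big\},$$

where $\mathcal{F}(\alpha, \beta) := \min_{\|\mathbf{v}\|_\infty \le 1} \Big\{ \alpha\big\|\tfrac{\lambda}{\beta}\mathbf{v} - \mathbf{h}\big\|_2 - \tfrac{\lambda}{\beta}\mathbf{x}_0^T\mathbf{v} \Big\}$.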

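The final step amounts to reducing the optimization over $\mathbf{v}$ to
a scalar optimization. The idea here is to apply the fact that
for any real $r > 0$, one has $\sqrt{r} = \min_{p > 0} \big\{ \tfrac{p}{2} + \tfrac{r}{2p} \big\}$, in order
to make the objective function separable. With this:

$$\mathcal{F}(\alpha, \beta) = \min_{\|\mathbf{v}\|_\infty \le 1} \min_{p > 0} \Big\{ \frac{\alpha p}{2} + \frac{\alpha\big\|\tfrac{\lambda}{\beta}\mathbf{v} - \mathbf{h}\big\|_2^2}{2p} - \frac{\lambda}{\beta}\mathbf{x}_0^T\mathbf{v} \Big\}$$

$$= \min_{p > 0} \Big\{ \frac{\alpha p}{2} + \sum_{i \in S} \Big[ \frac{\alpha}{2p} \min_{|v_i| \le \lambda/\beta} \Big( v_i - \Big( \mathbf{h}_i + \frac{p(\mathbf{x}_0)_i}{\alpha} \Big) \Big)^2 - \mathbf{h}_i(\mathbf{x}_0)_i - \frac{p(\mathbf{x}_0)_i^2}{2\alpha} \Big] + \frac{\alpha}{2p} \sum_{i \notin S} \min_{|v_i| \le \lambda/\beta} (v_i - \mathbf{h}_i)^2 \Big\}. \tag{12}$$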


The second equality above follows from standard completion
of squares. The scalar minimizations over the $v_i$'s in (12)
are simple soft-thresholding operations: $\min_{|q| \le \tau} (r - q)^2 = ((|r| - \tau)_+)^2$.
Combining all the above, we have shown that
$\phi(\mathbf{w}^*) = \min_{\alpha \ge 0} \max_{0 \le \beta \le 1,\, p \ge 0} \phi_0(\alpha, \beta, p)$, for an
appropriately defined objective function $\phi_0(\alpha, \beta, p)$.
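The two scalar facts used in the scalarization step, the variational form of the square root and the soft-thresholding identity, are easy to sanity-check numerically. The following is a minimal sketch; the function names and the test values are our own choices for illustration.

```python
import math

def sqrt_variational(r, p):
    """Objective p/2 + r/(2p), whose minimum over p > 0 equals sqrt(r)."""
    return p / 2.0 + r / (2.0 * p)

def box_min_sq(r, tau):
    """min over |q| <= tau of (r - q)^2, attained at q = r clipped to [-tau, tau]."""
    q = max(-tau, min(tau, r))
    return (r - q) ** 2

def soft_threshold_sq(r, tau):
    """Closed form ((|r| - tau)_+)^2 quoted in the text."""
    return max(abs(r) - tau, 0.0) ** 2

# The minimizer of the variational form is p = sqrt(r).
r = 7.3
value_at_minimizer = sqrt_variational(r, math.sqrt(r))
# Every other p on a grid gives a value that is at least as large.
grid_values = [sqrt_variational(r, 0.05 * j) for j in range(1, 400)]
```

The first identity is what turns the norm $\|\tfrac{\lambda}{\beta}\mathbf{v} - \mathbf{h}\|_2$ into a sum of squares, so that the minimization over $\mathbf{v}$ decouples across coordinates.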


2.4.2. Concentration 

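Our next step is to analyze the limiting behavior of $\phi_0(\alpha, \beta, p)$.
Recall that $\mathbf{g}$, $\mathbf{h}$, $g$ have i.i.d. $\mathcal{N}(0,1)$ entries. Thus, $\|\mathbf{g}\| \approx \sqrt{m}$
and $g \approx 0$. Using these and applying the Law of Large
Numbers to the summations in (12), it can be shown that
$\phi_0(\alpha, \beta, p) \approx \mathcal{D}(\alpha, \beta, p)$, where

$$\mathcal{D}(\alpha, \beta, p) := \beta\Big( \sqrt{\alpha^2 + m\sigma^2}\,\sqrt{m} - \frac{\alpha p}{2} + k\frac{p}{2\alpha} - k\frac{\alpha}{2p}\,\rho\big(\sqrt{1 + p^2/\alpha^2},\, \lambda/\beta\big) - (n - k)\frac{\alpha}{2p}\,\rho(1, \lambda/\beta) \Big),$$

$$\rho(c, \tau) := \mathbb{E}_{g \sim \mathcal{N}(0,1)}\big[ (|cg| - \tau)_+^2 \big] = 2(c^2 + \tau^2)\,Q(\tau/c) - \sqrt{2/\pi}\; c\tau\, e^{-\tau^2/(2c^2)}, \tag{13}$$

and $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\,dt$. Some technicalities
aside (omitted due to space limitations):

$$\min_{\mathbf{w}} \phi(\mathbf{w}) \approx \min_{\alpha \ge 0} \max_{\substack{0 \le \beta \le 1 \\ p \ge 0}} \mathcal{D}(\alpha, \beta, p), \tag{14a}$$

$$\min_{\|\mathbf{w}\| \in \mathcal{R}_\epsilon} \phi(\mathbf{w}) \approx \min_{\alpha \in \mathcal{R}_\epsilon} \max_{\substack{0 \le \beta \le 1 \\ p \ge 0}} \mathcal{D}(\alpha, \beta, p), \tag{14b}$$

where $\mathcal{R}_\epsilon = \{\ell \mid |\ell - \alpha_*| > \epsilon\alpha_*\}$ and $\alpha_*$ is the minimizer in (14a).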
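The closed form for $\rho(c, \tau)$ in (13) can be checked against direct numerical integration of its defining expectation. The sketch below uses only the Python standard library; the helper `rho_numeric`, its midpoint rule, and the truncation width are our own illustrative choices.

```python
import math

def Q(x):
    """Gaussian tail probability P(N(0,1) > x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def rho(c, tau):
    """Closed form of E[(|c g| - tau)_+^2] for g ~ N(0,1), c > 0, tau >= 0, as in (13)."""
    return (2.0 * (c * c + tau * tau) * Q(tau / c)
            - math.sqrt(2.0 / math.pi) * c * tau * math.exp(-tau * tau / (2.0 * c * c)))

def rho_numeric(c, tau, steps=20000, width=12.0):
    """Midpoint-rule estimate of 2 * integral over t >= tau/c of (c t - tau)^2 pdf(t)."""
    lo = tau / c
    h = width / steps
    total = 0.0
    for j in range(steps):
        t = lo + (j + 0.5) * h
        total += (c * t - tau) ** 2 * math.exp(-0.5 * t * t)
    return 2.0 * total * h / math.sqrt(2.0 * math.pi)
```

Two quick consistency checks follow directly from the definition: with $\tau = 0$ the expectation is $\mathbb{E}[c^2 g^2] = c^2$, so `rho(1.0, 0.0)` equals 1 and `rho(2.0, 0.0)` equals 4.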


2.5. Back from Gordon’s Optimization to the LASSO 

Once we have analyzed (GO), we may now appropriately ap¬ 
ply the Gaussian min-max Theorem (in particular, [7, Thm. 
II. 1]) to conclude with the following result. 

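Theorem 2.1 (LASSO Objective). Recall the definitions
of $\Phi(\mathbf{w})$, $\mathcal{D}(\alpha, \beta, p)$ in (4) and (13), respectively. Also, let
$\Phi_* := \min_{\mathbf{w}} \Phi(\mathbf{w})$ and $\mathcal{D}_* := \min_{\alpha \ge 0} \max_{0 \le \beta \le 1,\, p \ge 0} \mathcal{D}(\alpha, \beta, p)$.
Then, for all $\epsilon > 0$, $\lim_{n\to\infty} P(|\Phi_* - \mathcal{D}_*| > \epsilon\mathcal{D}_*) = 0$.

The proof follows from (14a) when combining (7) and (8).
Also, a single application of (7) in (14b) shows that for any
$\mathcal{R}_\epsilon \subset \mathbb{R}$:

$$\Phi_\epsilon := \min_{\|\mathbf{w}\| \in \mathcal{R}_\epsilon} \Phi(\mathbf{w}) \gtrsim \min_{\alpha \in \mathcal{R}_\epsilon} \max_{\substack{0 \le \beta \le 1 \\ p \ge 0}} \mathcal{D}(\alpha, \beta, p) =: \mathcal{D}_\epsilon. \tag{15}$$

To appreciate the power of Theorem 2.1, note that it gives
an explicit evaluation of the limiting behavior of the LASSO
optimal cost in terms of the optimal cost of a much simpler,
scalar and deterministic optimization problem. What is more,
we can combine Theorem 2.1 with (15) to prove the following
result about the limiting behavior of the LASSO error.

Theorem 2.2 (LASSO error). Let $\hat{\mathbf{x}}$ be a minimizer of the
LASSO in (1) and recall the definition of the function $\mathcal{D}(\alpha, \beta, p)$
in (13). If $\alpha_*$ is optimal for $\min_{\alpha \ge 0} \max_{0 \le \beta \le 1,\, p \ge 0} \mathcal{D}(\alpha, \beta, p)$, then
for all $\epsilon > 0$: $\lim_{n\to\infty} P\big( \big| \|\hat{\mathbf{x}} - \mathbf{x}_0\|_2 - \alpha_* \big| > \epsilon\alpha_* \big) = 0$.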












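We provide a sketch of the proof here (see also [7,
Thm. II.1]). Consider the event

$$\mathcal{E} = \{ \Phi_* \le (1 + \zeta_1)\mathcal{D}_* \ \text{and} \ \Phi_\epsilon \ge (1 - \zeta_2)\mathcal{D}_\epsilon \}.$$

In view of Thm. 2.1 and (15): $\lim_{n\to\infty} P(\mathcal{E}) = 1$. Hence,

$$\lim_{n\to\infty} P\big( \Phi_\epsilon < (1 + \delta)\Phi_* \big) \le \lim_{n\to\infty} P\big( \Phi_\epsilon < (1 + \delta)\Phi_* \mid \mathcal{E} \big) \le P\big( (1 - \zeta_2)\mathcal{D}_\epsilon < (1 + \delta)(1 + \zeta_1)\mathcal{D}_* \big). \tag{16}$$

Next, we show that we can choose $\zeta_1, \zeta_2, \delta$ such that the
(deterministic) event in (16) does not occur; this will complete
the proof of (5). It can be shown that the function $\mathcal{D}(\cdot, \beta, p)$
is strictly convex for all $\beta, p$. Thus, there exists a constant $L > 0$ such that

$$\mathcal{D}_\epsilon - \mathcal{D}_* =: \mathcal{D}(\alpha_\epsilon, \beta_\epsilon, p_\epsilon) - \mathcal{D}(\alpha_*, \beta_*, p_*) = \big[ \mathcal{D}(\alpha_\epsilon, \beta_\epsilon, p_\epsilon) - \mathcal{D}(\alpha_\epsilon, \beta_*, p_*) \big] + \big[ \mathcal{D}(\alpha_\epsilon, \beta_*, p_*) - \mathcal{D}(\alpha_*, \beta_*, p_*) \big] \ge 0 + L|\alpha_\epsilon - \alpha_*|^2 \ge \epsilon^2 L\alpha_*^2. \tag{17}$$

Denote $\theta = \theta(\epsilon) := \epsilon^2 L\alpha_*^2/\mathcal{D}_*$. Set $\zeta_1 = \theta/4$, choose $\zeta_2$ and
$\delta$ accordingly (sufficiently small in terms of $\theta$), and apply (17) to complete the proof.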


3.2. Asymptotic NSE 

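Here we consider the case $\sigma^2 \to 0$.

Theorem 3.2. Let $\tilde{\lambda} := \max\{\lambda, \lambda^0_{crit}\}$. If $m > D(\tilde{\lambda})$, then
the following limit holds in probability:

$$\lim_{n\to\infty} \lim_{\sigma\to 0} \frac{\|\hat{\mathbf{x}}^{\lambda,\sigma} - \mathbf{x}_0\|_2^2}{\sigma^2} = \frac{D(\tilde{\lambda})}{m - D(\tilde{\lambda})}.$$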

When $\lambda \ge \lambda^0_{crit}$, Theorem 3.2 recovers the result of
[2, Theorem 3.2]; in addition, we are able to characterize
the behavior when $\lambda < \lambda^0_{crit}$. The theorem suggests
that stable recovery is possible only when $m > D(\tilde{\lambda})$. In
particular, we require at least $m > \min_{\lambda > 0} D(\lambda)$.
A recent line of work [17, 18, 19, 20] has shown that
$\min_{\lambda \ge 0} D(\lambda)$ precisely characterizes the minimum number
$m$ of measurements required for exact recovery of sparse
signals under noiseless linear measurements. Also, the
minimum of the formula that appears in the theorem is achieved
at $\lambda_{best} := \arg\min_{\lambda \ge 0} D(\lambda)$. We refer the reader to Figure
1 for an illustration of $\lambda^0_{crit}$, $\lambda_{best}$ and to [2, Section 4.2] for a
detailed further discussion.
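To make the low-noise formula concrete, the sketch below evaluates $D(\lambda)$ from (10) and the ratio $D/(m - D)$ for the problem sizes of Fig. 1, and locates $\lambda_{best}$ by a simple grid search. The function names, the grid on $[0, 5]$, and the omission of the clipping at $\lambda^0_{crit}$ are our own simplifying assumptions, so the prediction only applies for $\lambda$ at or above that critical value.

```python
import math

def Q(x):
    """Gaussian tail probability P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def rho(c, tau):
    """Closed form of E[(|c g| - tau)_+^2], g ~ N(0,1), as in (13)."""
    return (2.0 * (c * c + tau * tau) * Q(tau / c)
            - math.sqrt(2.0 / math.pi) * c * tau * math.exp(-tau * tau / (2.0 * c * c)))

def D(lam, n, k):
    """D(lambda) = k (1 + lambda^2) + (n - k) rho(1, lambda), as defined in (10)."""
    return k * (1.0 + lam * lam) + (n - k) * rho(1.0, lam)

def predicted_nse(lam, n, m, k):
    """Low-noise NSE formula D / (m - D); requires m > D(lambda)."""
    d = D(lam, n, k)
    if m <= d:
        raise ValueError("formula requires m > D(lambda)")
    return d / (m - d)

n, m, k = 500, 150, 20                        # sizes used in Fig. 1
grid = [0.01 * j for j in range(0, 501)]      # hypothetical search grid on [0, 5]
lam_best = min(grid, key=lambda lam: D(lam, n, k))
nse_best = predicted_nse(lam_best, n, m, k)
```

A useful sanity check is that $D(0) = n$, since $\rho(1, 0) = 1$: without regularization the formula offers no reduction below the ambient dimension.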


2.6. Simplifying the Result 

For any values of $\lambda$ and $\sigma$, we ask for an accurate prediction of
the LASSO error $\|\hat{\mathbf{x}}^{\lambda,\sigma} - \mathbf{x}_0\|$. Theorem 2.2 yields an answer
as the solution of a minimax optimization, which is scalar,
deterministic and convex. We have put some additional effort into
simplifying this optimization and making it more explicit.
In particular, we can show that it has a unique global optimal
solution, the evaluation of which essentially reduces
to solving two nonlinear one-dimensional algebraic equations
(see Theorem 3.1). Apart from the computational advantage
of this alternative description, it also allows for further
theoretical insights. For instance, starting from Theorem 3.1 we
believe that it is possible to prove that the worst-case NSE is
attained in the limit of the noise variance $\sigma^2 \to 0$.


3. RESULTS 


3.1. Arbitrary SNR Values 

For the statement of our main result, recall (13) and further 
consider the definitions in (10). We also need the following. 
Definition 3.1 (A/ rit ). Define \° crit as the unique solution 
of the equation y) = 0. Further, define := 



It can be shown that for all cr > 0: A^(" < A(T rit < A° ril 


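Theorem 3.1. Let $\hat{\mathbf{x}}^{\lambda,\sigma}$ be the solution of the LASSO in (1).
Recall the definitions of $f$ and $\lambda^\sigma_{crit}$ in (10) and Definition 3.1.
Let $\tilde{\lambda} := \max\{\lambda, \lambda^\sigma_{crit}\}$ and $q_*(\alpha) = \sqrt{1 + \cdots}$.
Denote by $\alpha_*$ the unique solution of the equation $f\big(\tilde{\lambda}, q_*(\alpha)\big) = 0$
with respect to $\alpha$. Then, the following limit holds in probability:
$\lim_{n\to\infty} \|\hat{\mathbf{x}}^{\lambda,\sigma} - \mathbf{x}_0\|_2 = \alpha_*$.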


4. SIMULATION RESULTS AND CONCLUSION 



Fig. 1: (a) optimal cost; (b) normalized error. n = 500, m = 150, k = 20. Averages over 20 realizations.

In Figure 1, observe the close agreement of the simulation
results with the predictions of Theorems 3.1 and 3.2 (for the case
$\sigma^2 = 10^{-4}$, we used Theorem 3.2 for the prediction). Refer to
Section 3.2 for the definitions of $\lambda^0_{crit}$ and $\lambda_{best}$; $\lambda_{max}$ is such that
$m < D(\lambda)$ for all $\lambda > \lambda_{max}$. Our expressions are accurate
for problem sizes on the order of a few hundred and valid for
all values of $\sigma^2$. In Figure 1b, the worst-case NSE occurs in
the small noise-variance regime. We believe that this claim
can be proved using Theorems 3.1 and 3.2.

